Object detectors are conventionally trained by a weighted sum of classification and localization losses. Recent studies (e.g., predicting IoU with an auxiliary head, Generalized Focal Loss, Rank & Sort Loss) have shown that forcing these two loss terms to interact with each other in non-conventional ways creates a useful inductive bias and improves performance. Inspired by these works, we focus on the correlation between classification and localization and make two main contributions: (i) We provide an analysis of the effects of correlation between the classification and localization tasks in object detectors. We identify why correlation affects the performance of various NMS-based and NMS-free detectors, and we devise measures to evaluate the effect of correlation and use them to analyze common detectors. (ii) Motivated by our observations, e.g., that NMS-free detectors can also benefit from correlation, we propose Correlation Loss, a novel plug-in loss function that improves the performance of various object detectors by directly optimizing correlation coefficients: e.g., Correlation Loss on Sparse R-CNN, an NMS-free method, yields 1.6 AP gain on COCO and 1.8 AP gain on the Cityscapes dataset. Our best model on Sparse R-CNN reaches 51.0 AP without test-time augmentation on COCO test-dev, a state-of-the-art result. Code is available at https://github.com/fehmikahraman/CorrLoss
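The core idea of directly optimizing a correlation coefficient between classification scores and localization qualities can be sketched as below. This is a minimal illustration using the Pearson coefficient, not the authors' implementation; the function names and the choice of Pearson (rather than, e.g., a rank correlation) are assumptions for the sketch.

```python
# Illustrative sketch: a loss that is minimized when classification scores
# are perfectly (positively) correlated with localization qualities (IoUs).
# Names and details are assumptions, not the paper's actual code.
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_loss(cls_scores, ious):
    """1 - correlation: zero when scores and IoUs are perfectly aligned."""
    return 1.0 - pearson_correlation(cls_scores, ious)
```

In a real detector both inputs would be differentiable tensors, so the coefficient itself can be backpropagated through as an auxiliary loss term.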
Despite being widely used as a performance measure for visual detection tasks, Average Precision (AP) is limited in (i) reflecting localisation quality, (ii) robustness to the design choices in its computation, and (iii) applicability to outputs without confidence scores. Panoptic Quality (PQ), a measure proposed for evaluating panoptic segmentation (Kirillov et al., 2019), does not suffer from these limitations but is restricted to panoptic segmentation. In this paper, we propose Localisation Recall Precision (LRP) Error as the average matching error of a visual detector, computed based on both its localisation and classification qualities. LRP Error, initially proposed only for object detection by Oksuz et al. (2018), does not suffer from the aforementioned limitations and is applicable to all visual detection tasks. We also introduce Optimal LRP (oLRP) Error as the minimum LRP error obtained over confidence thresholds, both to evaluate visual detectors and to obtain optimal thresholds for deployment. We provide a detailed comparative analysis of LRP Error against AP and PQ, and use nearly 100 state-of-the-art visual detectors on seven visual detection tasks (i.e., object detection, keypoint detection, instance segmentation, panoptic segmentation, visual relationship detection, zero-shot detection and generalised zero-shot detection) over 10 datasets to show uniformly that LRP Error provides richer and more discriminative information than its counterparts. Code is available at: https://github.com/kemaloksuz/lrp-error
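A hedged sketch of how an LRP-style error can be computed from a detector's matched and unmatched outputs follows. It combines a normalized localisation term over true positives with false-positive and false-negative counts; treat the exact normalization here as an approximation of the published definition, and the function signature as an assumption.

```python
# Sketch of a Localisation Recall Precision (LRP)-style error in [0, 1]:
# 0 means perfect detections with perfect IoUs, 1 means everything wrong.
# The precise formulation is defined in the paper; this is illustrative.
def lrp_error(tp_ious, num_fp, num_fn, tau=0.5):
    """tp_ious: IoUs of true positives (each >= tau)
    num_fp: false-positive count; num_fn: false-negative count
    tau: IoU threshold for a detection to count as a valid match."""
    num_tp = len(tp_ious)
    total = num_tp + num_fp + num_fn
    if total == 0:
        return 0.0
    # Localisation penalty, scaled so an IoU of exactly tau costs 1.0.
    loc_component = sum((1.0 - iou) / (1.0 - tau) for iou in tp_ious)
    return (loc_component + num_fp + num_fn) / total
```

Note how, unlike AP, the value degrades smoothly as true-positive IoUs drop toward the threshold, which is the localisation-awareness the abstract highlights.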
The development of guidance, navigation and control frameworks/algorithms for swarms has attracted significant attention in recent years. That being said, planning swarm allocations/trajectories for engaging with enemy swarms remains a largely understudied problem. Although small-scale scenarios can be addressed with tools from differential game theory, existing approaches fail to scale to large-scale multi-agent pursuit-evasion (PE) scenarios. In this work, we propose a reinforcement learning (RL) based framework to decompose large-scale swarm engagement problems into a number of independent multi-agent pursuit-evasion games. We simulate a variety of multi-agent PE scenarios, where finite-time capture is guaranteed under certain conditions. The calculated PE statistics are provided as a reward signal to the high-level allocation layer, which uses an RL algorithm to allocate controlled swarm units to eliminate enemy swarm units with maximum efficiency. We verify our approach in large-scale swarm-to-swarm engagement simulations.
In this paper, we aim to address the large domain gap between high-resolution face images, e.g., from professional portrait photography, and low-quality surveillance images, e.g., from security cameras. Establishing an identity match between such disparate sources is a classical surveillance face identification scenario, which continues to be a challenging problem for modern face recognition techniques. To that end, we propose a method that combines face super-resolution, resolution matching, and multi-scale template accumulation to reliably recognize faces from long-range surveillance footage, including low-quality sources. The proposed approach does not require training or fine-tuning on the target dataset of real surveillance images. Extensive experiments show that our proposed method is able to outperform even existing methods fine-tuned on the SCFace dataset.
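The "template accumulation" step can be illustrated as below: per-scale face embeddings of the same identity are normalized and averaged into a single unit-norm template, which is then compared by cosine similarity. This is a minimal sketch of the general technique under assumed names; the paper's actual accumulation scheme may differ.

```python
# Minimal sketch of multi-scale template accumulation: L2-normalize each
# per-scale embedding, average them, and renormalize into one template.
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def accumulate_template(embeddings):
    """Fuse several embeddings (e.g., one per scale) into a unit-norm template."""
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)

def cosine_similarity(a, b):
    """Cosine similarity between two embeddings (scale-invariant match score)."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))
```

Averaging normalized embeddings tends to suppress per-scale noise, which is why accumulation helps on very low-quality probes.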
The emergence of COVID-19 has had a global and profound impact, not only on society as a whole, but also on the lives of individuals. Various prevention measures were introduced around the world to limit the transmission of the disease, including face masks, mandates for social distancing and regular disinfection in public spaces, and the use of screening applications. These developments also triggered the need for novel and improved computer vision techniques capable of (i) providing support to the prevention measures through an automated analysis of visual data, on the one hand, and (ii) facilitating normal operation of existing vision-based services, such as biometric authentication schemes, on the other. Especially important here are computer vision techniques that focus on the analysis of people and faces in visual data and have been affected the most by the partial occlusions introduced by the mandates for facial masks. Such computer vision based human analysis techniques include face and face-mask detection approaches, face recognition techniques, crowd counting solutions, age and expression estimation procedures, models for detecting face-hand interactions and many others, and have seen considerable attention over recent years. The goal of this survey is to provide an introduction to the problems induced by COVID-19 into such research and to present a comprehensive review of the work done in the computer vision based human analysis field. Particular attention is paid to the impact of facial masks on the performance of various methods and recent solutions to mitigate this problem. Additionally, a detailed review of existing datasets useful for the development and evaluation of methods for COVID-19 related applications is also provided. Finally, to help advance the field further, a discussion on the main open challenges and future research directions is given.
Collision avoidance in the presence of dynamic obstacles in unknown environments is one of the most critical challenges for unmanned systems. In this paper, we present a method that identifies obstacles as ellipsoids in order to estimate their linear and angular velocities. Our proposed method is based on the idea that any object can be approximately represented by an ellipsoid. To realize this, we propose a method based on variational Bayesian estimation of a Gaussian mixture model, the Kyachiyan algorithm, and a refinement algorithm. Unlike existing optimization-based methods, our proposed method does not require knowledge of the number of clusters and can operate in real time. In addition, we define an ellipsoid-based feature vector to match two temporally close point frames. Our method can be applied to any environment with static and dynamic obstacles, including environments with rotating obstacles. We compare our algorithm with other clustering methods and show that, when combined with a trajectory planner, the overall system can effectively traverse unknown environments in the presence of dynamic obstacles.
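The basic idea of summarizing a point cluster as an ellipsoid can be sketched in 2-D as follows: the sample mean and covariance of the cluster define an ellipse whose axes come from the covariance eigenvalues. This is a deliberately simplified stand-in for the paper's variational-Bayes GMM pipeline, with assumed function names.

```python
# Sketch: summarize a 2-D point cluster as an ellipse via mean + covariance.
# This is NOT the paper's variational Bayesian GMM method, just the core
# geometric idea of an ellipsoid representation of a cluster.
import math

def fit_ellipse_2d(points):
    """Return (mean, covariance) of a 2-D point cluster."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))

def ellipse_axes(cov):
    """Eigenvalues of the symmetric 2x2 covariance: squared semi-axis scales."""
    (a, b), (_, c) = cov
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr / 4.0 - det, 0.0))
    return tr / 2.0 + disc, tr / 2.0 - disc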
Although machine learning methods perform well within their training domain, they often fail in the real world. In cardiovascular magnetic resonance (CMR) imaging, respiratory motion represents a major challenge for acquisition quality and the subsequent analysis and final diagnosis. We present a workflow that predicts a severity score for respiratory motion in CMR for the CMRxMotion challenge 2022. This is an important tool for technicians, as it provides immediate feedback on CMR quality during acquisition: poor-quality images can be reacquired directly, while the patient is still nearby and available. Our approach thus ensures that acquired CMRs reach a specific quality standard before they are used for further diagnosis, providing a valid basis for proper diagnosis even in cases of severe motion artefacts. Combined with our segmentation model, this can support the daily work of cardiologists and technicians by providing a complete pipeline that guarantees proper quality assessment and trustworthy segmentation of cardiovascular scans. The codebase is available at https://github.com/meclabtuda/qa_med_data/tree/dev_qa_cmrxmotion.
To maintain standards in medical imaging studies, images should have the image quality necessary for potential diagnostic use. Although CNN-based approaches have been used to assess image quality, their performance can still be improved in terms of accuracy. In this work, we approach this problem with the Swin Transformer, which improves classification performance on poor-quality images that cause degradation in medical image quality. We tested our approach on two problems: foreign object classification on chest X-rays (Object-CXR) and left ventricular outflow tract (LVOT) classification on cardiac MRI. While we obtained classification accuracies of 87.1% on Object-CXR and 95.48% on the LVOT dataset, our experimental results show that the use of the Swin Transformer improves Object-CXR classification performance while achieving comparable performance on the LVOT dataset. To the best of our knowledge, our study is the first Vision Transformer application to medical image quality assessment.
This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022) held at the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams from academia. In the end, six valid submissions were received and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. The participants were free to use any training data, and the test data was constructed by the organizers by synthesizing occluded parts on face images of a well-known dataset. The submitted solutions presented innovations and performed well beyond the considered baseline. A major output of this competition is a challenging, realistic, diverse, and publicly available occluded face recognition benchmark with a well-defined evaluation protocol.
Deep learning (DL) has found rich applications in the wireless domain to improve spectrum awareness. Typically, DL models are either randomly initialized following a statistical distribution or pretrained on other data domains such as computer vision (in the form of transfer learning), without accounting for the unique characteristics of wireless signals. Self-supervised learning can learn useful representations from radio frequency (RF) signals themselves, even when only a limited number of labeled training data samples is available. We present the first self-supervised RF signal representation learning model by specifically formulating a set of transformations that capture wireless signal characteristics, and apply it to the automatic modulation recognition (AMR) task. We show that sample efficiency (the number of labeled samples required to achieve a certain accuracy) can be significantly improved by learning signal representations with self-supervised learning. This translates to substantial time and cost savings. Furthermore, self-supervised learning improves model accuracy compared to state-of-the-art DL methods, and maintains high accuracy even when only a small portion of the training data samples is used.
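Signal-specific transformations of the kind the abstract alludes to can be sketched as below: each transformation produces an altered "view" of a complex IQ sequence while preserving its modulation content, which is what contrastive self-supervision needs. The two transformations shown (phase rotation and circular time shift) are common RF augmentations chosen here as illustrative assumptions; the paper defines its own transformation set.

```python
# Hedged sketch of wireless-specific augmentations for self-supervised
# representation learning on complex IQ samples. Illustrative only.
import cmath
import math
import random

def phase_rotate(iq, theta):
    """Rotate every complex IQ sample by theta radians (magnitude-preserving)."""
    w = cmath.exp(1j * theta)
    return [s * w for s in iq]

def time_shift(iq, k):
    """Circularly shift the IQ sequence by k samples."""
    k %= len(iq)
    return iq[k:] + iq[:k]

def random_view(iq, rng=random):
    """One stochastic 'view' of a signal, as used in contrastive pretraining."""
    out = phase_rotate(iq, rng.uniform(0.0, 2.0 * math.pi))
    return time_shift(out, rng.randrange(len(iq)))
```

Two `random_view` outputs of the same signal would form a positive pair for a contrastive objective, while views of different signals form negatives.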